Uasin Gishu County
RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset
Etori, Naome A., Gini, Maria L.
Social media has become a crucial open-access platform for individuals to express opinions and share experiences. However, leveraging low-resource language data from Twitter is challenging due to scarce, poor-quality content and the major variations in language use, such as slang and code-switching. Identifying tweets in these languages can be difficult as Twitter primarily supports high-resource languages. We analyze Kenyan code-switched data and evaluate four state-of-the-art (SOTA) transformer-based pretrained models for sentiment and emotion classification, using supervised and semi-supervised methods. We detail the methodology behind data collection and annotation, and the challenges encountered during the data curation phase. Our results show that XLM-R outperforms other models: for sentiment analysis, the supervised XLM-R model achieves the highest accuracy (69.2%) and F1 score (66.1%), while the semi-supervised XLM-R model reaches 67.2% accuracy and a 64.1% F1 score. In emotion analysis, supervised DistilBERT leads in accuracy (59.8%) and F1 score (31%), while semi-supervised mBERT reaches 59% accuracy and a 26.5% F1 score. AfriBERTa models show the lowest accuracy and F1 scores. All models tend to predict neutral sentiment, with Afri-BERT showing the highest bias and unique sensitivity to empathy emotion. https://github.com/NEtori21/Ride_hailing
- Africa > Kenya > Nairobi City County > Nairobi (0.07)
- Africa > Kenya > Nairobi Province (0.06)
- Africa > Kenya > Mombasa County > Mombasa (0.05)
- (18 more...)
- Transportation > Passenger (1.00)
- Information Technology (1.00)
- Transportation > Ground > Road (0.93)
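The RideKE abstract above reports accuracy well above F1 for emotion classification and notes that all models lean toward the neutral class. A minimal sketch of why those two observations tend to go together, assuming a macro-averaged F1 (the abstract does not say which averaging was used) and entirely made-up labels:

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging, not from the paper)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never correct scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: a classifier that always answers "neutral".
gold = ["neutral"] * 6 + ["joy"] * 2 + ["anger"] * 2
pred = ["neutral"] * 10

print(accuracy(gold, pred))   # majority-class share
print(macro_f1(gold, pred))   # pulled down by the two missed classes
```

On this toy data the always-neutral predictor is right on 6 of 10 items, yet its macro F1 is only a quarter, because two of the three classes contribute zero.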
Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election
Mondini, Roberto, Kotonya, Neema, Logan, Robert L. IV, Olson, Elizabeth M, Lungati, Angela Oduor, Odongo, Daniel Duke, Ombasa, Tim, Lamba, Hemank, Cahill, Aoife, Tetreault, Joel R., Jaimes, Alejandro
Online reporting platforms have enabled citizens around the world to collectively share their opinions and report in real time on events impacting their local communities. Systematically organizing (e.g., categorizing by attributes) and geotagging large amounts of crowdsourced information is crucial to ensuring that accurate and meaningful insights can be drawn from this data and used by policy makers to bring about positive change. These tasks, however, typically require extensive manual annotation efforts. In this paper we present Uchaguzi-2022, a dataset of 14k categorized and geotagged citizen reports related to the 2022 Kenyan General Election containing mentions of election-related issues such as official misconduct, vote count irregularities, and acts of violence. We use this dataset to investigate whether language models can assist in scalably categorizing and geotagging reports, thus highlighting its potential application in the AI for Social Good space.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Africa > Kenya > Bomet County > Bomet (0.05)
- (34 more...)
Farmer.Chat: Scaling AI-Powered Agricultural Services for Smallholder Farmers
Singh, Namita, Wang'ombe, Jacqueline, Okanga, Nereah, Zelenska, Tetyana, Repishti, Jona, K, Jayasankar G, Mishra, Sanjeev, Manokaran, Rajsekar, Singh, Vineet, Rafiq, Mohammed Irfan, Gandhi, Rikin, Nambi, Akshay
Small and medium-sized agricultural holders face challenges like limited access to localized, timely information, impacting productivity and sustainability. Traditional extension services, which rely on in-person agents, struggle with scalability and timely delivery, especially in remote areas. We introduce Farmer.Chat, a generative AI-powered chatbot designed to address these issues. Leveraging Generative AI, Farmer.Chat offers personalized, reliable, and contextually relevant advice, overcoming limitations of previous chatbots in deterministic dialogue flows, language support, and unstructured data processing. Deployed in four countries, Farmer.Chat has engaged over 15,000 farmers and answered over 300,000 queries. This paper highlights how Farmer.Chat's innovative use of GenAI enhances agricultural service scalability and effectiveness. Our evaluation, combining quantitative analysis and qualitative insights, highlights Farmer.Chat's effectiveness in improving farming practices, enhancing trust, response quality, and user engagement.
- North America > United States > Texas > Crockett County (0.04)
- Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
- Africa > Kenya > Nyeri County > Nyeri (0.04)
- (21 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (0.95)
- Research Report > Experimental Study (0.67)
- Health & Medicine (1.00)
- Food & Agriculture > Agriculture (1.00)
- Education (1.00)
Rural Kenyans power West's AI revolution. Now they want more
Naivasha, Kenya – Caroline Njau comes from a family of farmers who tend to fields of maize, wheat, and potatoes in the hilly terrain near Nyahururu, 180 kilometres (112 miles) north of the capital Nairobi. But Njau has chosen a different path in life. Seated in her living room with a cup of milk tea, she labels data for artificial intelligence (AI) companies abroad on an app. The sun rises over the unpaved streets of her neighbourhood as she flicks through images of tarmac roads, intersections and sidewalks on her smartphone while carefully drawing boxes around various objects: traffic lights, cars, pedestrians, and signposts. The designer of the app – an American subcontractor to Silicon Valley companies – pays her $3 an hour.
- Africa > Kenya > Nairobi City County > Nairobi (0.29)
- North America > United States > California (0.26)
- Africa > South Africa (0.06)
- (9 more...)
- Information Technology (0.71)
- Transportation > Ground > Road (0.70)
- Transportation > Infrastructure & Services (0.55)
Edge.org
The conversation is on hold. The Edge community has hit the road... or they're staying home. Preparing for the academic year to begin, wrapping up projects and starting new ones, celebrating with family and friends or contemplating in solitude. After a hiatus, Edge is pleased to revive Summer Postcards: Edgies reporting in from wherever they are and on whatever they're doing, as the dog days wind out and the season comes to a close. As the world slowly returns to a "new normal" with enduring COVID restrictions in the midst of renewed vaccine freedoms, this year's collection is a testament to change (temporary and lasting), a consideration of loss (will travel ever be like it was?), and a celebration of questions (that still need answering). The hammock may be away until next year, but the memories remain. I spent the summer writing and revising the final section of a longish novel I started in 2019. It seems now as though I've been from 1946 to 2021 on my hands and knees. Various lockdowns have been a liberation from obligations and the luggage carousel, and I've never known such sweet and total focus for months on end. We have the luxury of living in the country--no shortage of big skies and moody walks. All our few breaks were in the UK--Scotland, the Lake District, the West Country. Even in our remote part of the Lakes, I had to keep on writing--as in photo. The best novel I read this summer was Sandro Veronesi's The Hummingbird. Best non-fiction was Peter Godfrey-Smith's Metazoa: Animal Life and the Birth of the Mind. I gave time also to some wonderful novellas--perfect fictional form for you too-busy scientists. IAN MCEWAN is a novelist whose works have earned him worldwide critical acclaim. He is the recipient of the Man Booker Prize for Amsterdam (1998), the National Book Critics' Circle Fiction Award, and the Los Angeles Times Prize for Fiction for Atonement (2003). His most recent novel is Machines Like Me.
In 2019, Časlav Brukner and myself were walking on a beach on Lamma Island, near Hong Kong, marvelling together at the astonishing strangeness of quantum phenomena. This summer, the conversation with Časlav has continued on another island, and quite an island: Lesbos, the northern Greek island near the Turkish coast. Lesbos is the place where lyrical poetry was born. Here lived Sappho and Alcaeus.
- Europe > United Kingdom > Scotland (0.24)
- North America > United States > California > Los Angeles County > Los Angeles (0.24)
- Europe > Netherlands > North Holland > Amsterdam (0.24)
- (44 more...)
- Media (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
- Education > Educational Setting (0.93)
- (3 more...)
Capturing the temporal constraints of gradual patterns
Gradual pattern mining allows for extraction of attribute correlations through gradual rules such as: "the more X, the more Y". Such correlations are useful in identifying and isolating relationships among the attributes that may not be obvious through quick scans on a data set. For instance, a researcher may apply gradual pattern mining to determine which attributes of a data set exhibit unfamiliar correlations in order to isolate them for deeper exploration or analysis. In this work, we propose an ant colony optimization technique which uses a popular probabilistic approach that mimics the behavior of biological ants as they search for the shortest path to find food in order to solve combinatorial problems. In our second contribution, we extend an existing gradual pattern mining technique to allow for extraction of gradual patterns together with an approximated temporal lag between the affected gradual item sets. Such a pattern is referred to as a fuzzy-temporal gradual pattern and it may take the form: "the more X, the more Y, almost 3 months later". In our third contribution, we propose a data crossing model that allows for integration of mostly gradual pattern mining algorithm implementations into a Cloud platform. This contribution is motivated by the proliferation of IoT applications in almost every area of our society and this comes with provision of large-scale time-series data from different sources.
- North America > United States > California > San Francisco County > San Francisco (0.13)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.13)
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
- (11 more...)
- Information Technology (1.00)
- Energy (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.67)
- Health & Medicine > Therapeutic Area > Immunology (0.45)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.67)
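To make the rule "the more X, the more Y" from the gradual-pattern abstract above concrete, one common way to score it is the fraction of record pairs in which X and Y move in the same direction. The sketch below assumes that pairwise definition of support; it is an illustration, not the ant colony or fuzzy-temporal method the abstract proposes, and the function name and data are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def gradual_support(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Share of record pairs where x and y increase (or decrease) together."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long columns with at least 2 rows")
    pairs = list(combinations(range(len(xs)), 2))
    # A pair is concordant when the differences in x and y have the same sign.
    concordant = sum(
        1 for i, j in pairs if (xs[i] - xs[j]) * (ys[i] - ys[j]) > 0
    )
    return concordant / len(pairs)


# Hypothetical columns: y mostly rises with x, with one out-of-order record.
x = [1, 2, 3, 4]
y = [10, 20, 15, 40]
print(gradual_support(x, y))
```

Here five of the six pairs are concordant, so the rule holds on most of the data; a miner would keep it only if this value clears a user-chosen support threshold.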
Inference for BART with Multinomial Outcomes
Xu, Yizhen, Hogan, Joseph W., Daniels, Michael J., Kantor, Rami, Mwangi, Ann
The multinomial probit Bayesian additive regression trees (MPBART) framework was proposed by Kindo et al. (KD), approximating the latent utilities in the multinomial probit (MNP) model with BART (Chipman et al. 2010). Compared to multinomial logistic models, MNP does not assume independent alternatives and the correlation structure among alternatives can be specified through multivariate Gaussian distributed latent utilities. We introduce two new algorithms for fitting the MPBART and show that the theoretical mixing rates of our proposals are equal or superior to the existing algorithm in KD. Through simulations, we explore the robustness of the methods to the choice of reference level, imbalance in outcome frequencies, and the specifications of prior hyperparameters for the utility error term. The work is motivated by the application of generating posterior predictive distributions for mortality and engagement in care among HIV-positive patients based on electronic health records (EHRs) from the Academic Model Providing Access to Healthcare (AMPATH) in Kenya. In both the application and simulations, we observe better performance using our proposals as compared to KD in terms of MCMC convergence rate and posterior predictive accuracy.
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- Africa > Kenya > Uasin Gishu County > Eldoret (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology > HIV (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Kenyan blood donors
By the end of July 2020, Kenya had reported only 341 deaths and ∼20,000 cases of COVID-19. This is in marked contrast to the tens of thousands of deaths reported in many higher-income countries. The true extent of COVID-19 in the community was unknown and likely to be higher than reports indicated. Uyoga et al. found an overall seroprevalence among blood donors of 4.3%, peaking in 35- to 44-year-old individuals (see the Perspective by Maeda and Nkengasong). The low mortality can be partly explained by the steep demographics in Kenya, where less than 4% of the population is 65 or older. These circumstances combine to result in Kenyan hospitals not currently being overwhelmed by patients with respiratory distress. However, the imposition of a strict lockdown in this country has shifted the disease burden to maternal and child deaths as a result of disruption to essential medical services. Science, this issue p. 79; see also p. 27. The spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Africa is poorly described. The first case of SARS-CoV-2 in Kenya was reported on 12 March 2020, and an overwhelming number of cases and deaths were expected, but by 31 July 2020, there were only 20,636 cases and 341 deaths. However, the extent of SARS-CoV-2 exposure in the community remains unknown. We determined the prevalence of anti–SARS-CoV-2 immunoglobulin G among blood donors in Kenya in April–June 2020. Crude seroprevalence was 5.6% (174 of 3098). Population-weighted, test-performance-adjusted national seroprevalence was 4.3% (95% confidence interval, 2.9 to 5.8%) and was highest in urban counties Mombasa (8.0%), Nairobi (7.3%), and Kisumu (5.5%). SARS-CoV-2 exposure is more extensive than indicated by case-based surveillance, and these results will help guide the pandemic response in Kenya and across Africa.
- Africa > Kenya > Nairobi City County > Nairobi (0.26)
- Africa > Kenya > Mombasa County > Mombasa (0.26)
- Africa > Kenya > Kisumu County > Kisumu (0.25)
- (8 more...)
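The crude seroprevalence quoted in the abstract above follows directly from the counts (174 of 3098). The sketch below reproduces that arithmetic and, as an illustration only, shows the standard Rogan-Gladen correction for imperfect test sensitivity and specificity. The abstract does not say which adjustment the authors used or give the assay's performance, so the sensitivity and specificity values here are hypothetical and the output is not expected to match the reported 4.3%.

```python
def crude_prevalence(positives: int, tested: int) -> float:
    """Observed (apparent) prevalence: positives divided by samples tested."""
    return positives / tested


def rogan_gladen(apparent: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimate of true prevalence, clamped to [0, 1]."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must perform better than chance")
    estimate = (apparent + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))


crude = crude_prevalence(174, 3098)  # counts taken from the abstract
print(f"crude seroprevalence: {crude:.1%}")

# Hypothetical assay performance, for illustration only.
adjusted = rogan_gladen(crude, sensitivity=0.90, specificity=0.99)
print(f"adjusted under assumed test performance: {adjusted:.1%}")
```

The published national figure is additionally population-weighted, which this sketch does not attempt.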
Classification using Ensemble Learning under Weighted Misclassification Loss
Xu, Yizhen, Liu, Tao, Daniels, Michael J., Kantor, Rami, Mwangi, Ann, Hogan, Joseph W.
Binary classification rules based on covariates typically depend on simple loss functions such as zero-one misclassification. Some cases may require more complex loss functions. For example, individual-level monitoring of HIV-infected individuals on antiretroviral therapy (ART) requires periodic assessment of treatment failure, defined as having a viral load (VL) value above a certain threshold. In some resource-limited settings, VL tests may be limited by cost or technology, and diagnoses are based on other clinical markers. Depending on the scenario, a higher premium may be placed on avoiding false positives, which bring greater cost and reduced treatment options. Here, the optimal rule is determined by minimizing a weighted misclassification loss/risk. We propose a method for finding and cross-validating optimal binary classification rules under weighted misclassification loss. We focus on rules comprising a prediction score and an associated threshold, where the score is derived using an ensemble learner. Simulations and examples show that our method, which derives the score and threshold jointly, more accurately estimates overall risk and has better operating characteristics compared with methods that derive the score first and the cutoff conditionally on the score, especially for finite samples.
- North America > United States > Texas > Travis County > Austin (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Wisconsin (0.04)
- (6 more...)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology > HIV (0.36)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
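The weighted misclassification loss described in the abstract above can be illustrated with a small sketch: given a prediction score, false positives and false negatives are charged different costs, and the threshold is chosen to minimize the average cost. Note that this picks the cutoff conditionally on a fixed score, which is the simpler baseline the abstract contrasts with, not the authors' joint procedure. The scores, labels, and weights are hypothetical.

```python
from typing import Sequence, Tuple


def weighted_loss(scores: Sequence[float], labels: Sequence[int],
                  threshold: float, w_fp: float, w_fn: float) -> float:
    """Average cost of the rule 'predict 1 when score > threshold'."""
    total = 0.0
    for s, y in zip(scores, labels):
        pred = 1 if s > threshold else 0
        if pred == 1 and y == 0:
            total += w_fp   # false positive
        elif pred == 0 and y == 1:
            total += w_fn   # false negative
    return total / len(labels)


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   w_fp: float, w_fn: float) -> Tuple[float, float]:
    """Search the observed scores (plus -inf) for the lowest-cost cutoff."""
    candidates = [float("-inf")] + sorted(set(scores))
    best = min(candidates,
               key=lambda t: weighted_loss(scores, labels, t, w_fp, w_fn))
    return best, weighted_loss(scores, labels, best, w_fp, w_fn)


# Hypothetical scores and true labels (1 = treatment failure).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(best_threshold(scores, labels, w_fp=1.0, w_fn=1.0))
print(best_threshold(scores, labels, w_fp=4.0, w_fn=1.0))
```

Raising the false-positive weight moves the chosen cutoff upward, so fewer cases are flagged; that is the trade-off the abstract describes when false positives are the costlier error.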